Game of Thrones Ratings Per Episode Per Season

1 Background and Research Question

“How have the ratings of Game of Thrones episodes evolved over time across different seasons?”

This research question aims to explore trends in the data, such as whether ratings rose or fell over the course of the show, or whether there were notable drops after key plot developments.

2 Data Source

The raw data was obtained from the IMDB website, which is publicly accessible via this link:

https://www.imdb.com/title/tt0944947/ratings/

The data used in this analysis was extracted from the ‘Ratings by episode’ section of the Game of Thrones page.

I created an Excel spreadsheet that replicated the grid shown on the IMDB page, with one row per season and one column per episode rating, making the data accessible for wrangling and visualisation.

3 Data Preparation

3.1 Package Versions

| Library and Version | Purpose |
|---|---|
| tidyverse_2.0.0 | for handling data |
| here_1.0.1 | for easy file and directory referencing |
| readxl_1.4.3 | for reading Excel files |
| knitr_1.49 | for combining R code with text to create dynamic reports |
| dplyr_1.1.4 | for data manipulation tasks |
| jpeg_0.10.10 | for reading jpeg files |
| gganimate_1.0.9 | for creating animated plots |

3.2 Load Packages

library(tidyverse)
library(here)
library(readxl)
library(knitr)
library(dplyr)
library(jpeg)
library(gganimate)

3.3 Import the Data

#load data from excel file
rawdata <- read_excel(here::here("raw_data", "raw_data.xlsx"))
## New names:
## • `` -> `...1`

The header cell of the first column was empty in the spreadsheet, so R has automatically assigned that column the name `...1`. First, I am going to print the data to see how it looks initially after importing.

#this is a sanity check to inspect the data
print(rawdata)
## # A tibble: 8 × 12
##   ...1     e1    e2    e3    e4    e5    e6    e7    e8    e9   e10   e11
##   <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 s1      8.9   8.6   8.5   8.6   9     9.1   9.1   8.9   9.6   9.4   9.2
## 2 s2      8.6   8.3   8.7   8.6   8.6   8.9   8.8   8.6   9.7   9.3  NA  
## 3 s3      8.6   8.4   8.7   9.5   8.9   8.7   8.6   8.9   9.9   9.1  NA  
## 4 s4      9     9.7   8.7   8.7   8.6   9.7   9     9.7   9.6   9.7  NA  
## 5 s5      8.3   8.3   8.3   8.5   8.5   7.9   8.8   9.8   9.4   9.1  NA  
## 6 s6      8.4   9.2   8.6   9     9.7   8.3   8.5   8.3   9.9   9.9  NA  
## 7 s7      8.5   8.8   9.1   9.7   8.7   9     9.4  NA    NA    NA    NA  
## 8 s8      7.6   7.9   7.5   5.5   5.9   4    NA    NA    NA    NA    NA
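The `...1` name in the import message comes from the name repair that readxl applies to blank header cells. A minimal sketch of the same repair, assuming the vctrs package (installed alongside the tidyverse) and a hypothetical header vector rather than the real spreadsheet:

```r
library(vctrs)

# Hypothetical header row: a blank first cell followed by episode columns
header <- c("", "e1", "e2")

# "unique" repair replaces a blank name with a positional name of the form ...<position>
repaired <- vec_as_names(header, repair = "unique", quiet = TRUE)

print(repaired)
```

Because the repaired name depends only on the column position, renaming it to something meaningful straight after import keeps the later code readable.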

The data has been imported successfully. The next step is to wrangle it into a form that can be easily visualised.

4 Data Wrangling

4.1 Renaming the columns

#rename the first column after automatic assignment of "...1"
colnames(rawdata) <- ifelse(colnames(rawdata) == "...1", "Season", colnames(rawdata))

#rename the remaining columns, excluding the first, which has just been renamed to "Season"
colnames(rawdata)[-1] <- paste("Episode", 1:11)

#this is a sanity check to make sure the column headers changed
head(rawdata, n = 1)
## # A tibble: 1 × 12
##   Season `Episode 1` `Episode 2` `Episode 3` `Episode 4` `Episode 5` `Episode 6`
##   <chr>        <dbl>       <dbl>       <dbl>       <dbl>       <dbl>       <dbl>
## 1 s1             8.9         8.6         8.5         8.6           9         9.1
## # ℹ 5 more variables: `Episode 7` <dbl>, `Episode 8` <dbl>, `Episode 9` <dbl>,
## #   `Episode 10` <dbl>, `Episode 11` <dbl>
#change the values in the first column,
#removing the leading 's' to clean up the view of the table
rawdata$Season <- sub("^s", "", rawdata$Season)

#render the table with kable
kable(rawdata, format = "markdown")
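| Season | Episode 1 | Episode 2 | Episode 3 | Episode 4 | Episode 5 | Episode 6 | Episode 7 | Episode 8 | Episode 9 | Episode 10 | Episode 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 8.9 | 8.6 | 8.5 | 8.6 | 9.0 | 9.1 | 9.1 | 8.9 | 9.6 | 9.4 | 9.2 |
| 2 | 8.6 | 8.3 | 8.7 | 8.6 | 8.6 | 8.9 | 8.8 | 8.6 | 9.7 | 9.3 | NA |
| 3 | 8.6 | 8.4 | 8.7 | 9.5 | 8.9 | 8.7 | 8.6 | 8.9 | 9.9 | 9.1 | NA |
| 4 | 9.0 | 9.7 | 8.7 | 8.7 | 8.6 | 9.7 | 9.0 | 9.7 | 9.6 | 9.7 | NA |
| 5 | 8.3 | 8.3 | 8.3 | 8.5 | 8.5 | 7.9 | 8.8 | 9.8 | 9.4 | 9.1 | NA |
| 6 | 8.4 | 9.2 | 8.6 | 9.0 | 9.7 | 8.3 | 8.5 | 8.3 | 9.9 | 9.9 | NA |
| 7 | 8.5 | 8.8 | 9.1 | 9.7 | 8.7 | 9.0 | 9.4 | NA | NA | NA | NA |
| 8 | 7.6 | 7.9 | 7.5 | 5.5 | 5.9 | 4.0 | NA | NA | NA | NA | NA |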

The table above is a much cleaner version of the data, but it is not yet ready for visualisation. Before plotting, I am going to remove the ‘Episode 11’ values, which is done once the data has been reshaped. These values are not required in the analysis because ‘Episode 11’ is the unaired original pilot. Audiences never saw this episode, and it was simply an alternate to the official pilot episode that was released. ‘Episode 11’ is therefore excluded from the final dataset.
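As an alternative to filtering after the reshape, the column could be dropped while the table is still wide. A minimal sketch, assuming a hypothetical two-row stand-in (`wide_example`) with values copied from seasons 1 and 2 rather than the full `rawdata` object:

```r
library(dplyr)
library(tibble)

# Two-row stand-in for the wide table (values copied from seasons 1 and 2)
wide_example <- tibble(
  Season = c("1", "2"),
  `Episode 10` = c(9.4, 9.3),
  `Episode 11` = c(9.2, NA)
)

# Drop the unaired-pilot column before any reshaping
wide_trimmed <- wide_example %>%
  select(-`Episode 11`)

print(names(wide_trimmed))
```

Either approach should give the same long table; dropping the column first simply means the extra rows are never created.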

4.2 Reshaping the data

# Reshape the data to long format for a more flexible structure for visualizing, analyzing, and modeling data. This is easier for ggplot2 to handle.
rawdata_long <- rawdata %>%
  pivot_longer(cols = starts_with("Episode"),  # Select columns that start with "Episode"
               names_to = "Episode",           # Create a new column "Episode"
               values_to = "Rating")          # Create a new column "Rating"

# Exclude Episode 11 from the data
data <- rawdata_long %>%
  filter(str_replace(Episode, "Episode ", "") != "11")

#this is a sanity check to make sure the data is now in a long format
head(data)
## # A tibble: 6 × 3
##   Season Episode   Rating
##   <chr>  <chr>      <dbl>
## 1 1      Episode 1    8.9
## 2 1      Episode 2    8.6
## 3 1      Episode 3    8.5
## 4 1      Episode 4    8.6
## 5 1      Episode 5    9  
## 6 1      Episode 6    9.1
# Save the data as a CSV file, which can be opened in Excel.
write.csv(data, "data/data.csv", row.names = FALSE)
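The reshape above keeps the episode label as text and keeps rows whose rating is `NA` (seasons with fewer than ten episodes). A hedged sketch of a variant that parses the episode number and drops the missing rows in the same `pivot_longer()` call, using a hypothetical stand-in table (`wide_example`) with values copied from seasons 7 and 8:

```r
library(dplyr)
library(tidyr)
library(tibble)

# Small stand-in for the wide table (values copied from seasons 7 and 8)
wide_example <- tibble(
  Season = c("7", "8"),
  `Episode 6` = c(9.0, 4.0),
  `Episode 7` = c(9.4, NA)
)

long_example <- wide_example %>%
  pivot_longer(
    cols = starts_with("Episode"),
    names_to = "EpisodeNumber",
    names_prefix = "Episode ",                           # strip the text prefix from the column names
    names_transform = list(EpisodeNumber = as.integer),  # store the remaining number as an integer
    values_to = "Rating",
    values_drop_na = TRUE                                # drop season/episode pairs with no rating
  )

print(long_example)
```

This is only one possible design. Keeping the `NA` rows, as the analysis does, also works because ggplot2 removes missing values when drawing, usually with a warning.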

5 Visualisations

5.1 Basic Visualisation

#Create a basic line plot with minimal customisation
p <- ggplot(data, aes(x = as.integer(str_replace(Episode, "Episode ", "")),  # Convert episode to numeric
                         y = Rating, 
                         color = factor(Season))) +  # Use Season for different lines
  geom_line() +                        # Draw lines
  geom_point() +                       # Add points for each episode
  labs(x = "Episode Number",            # Label for x-axis
       y = "Episode Rating",            # Label for y-axis
       color = "Season Number") +       # Label for the legend
  theme_minimal() +                    # Use a minimal theme for a clean look
  scale_color_viridis_d() +             # Add color scale for different lines
  theme(legend.position = "right")      # Place the legend at the right

#view the plot as a sanity check to assess what direction to take the customisations.
print(p)

5.2 Customisation

The Episode column is stored as text (for example, “Episode 1”), so it is recoded into a numeric column. This allows the scale of the x-axis to be customised with numeric breaks.

# Preprocess the Episode column to extract numeric episode numbers.
# This removes the "Episode " part of the string and converts the remaining number (e.g., "1", "2") into an integer, creating a new column 'EpisodeNumber'
data1 <- data %>%
  mutate(EpisodeNumber = as.integer(str_replace(Episode, "Episode ", "")))

# This is a sanity check to view the new column
print(data1)
## # A tibble: 80 × 4
##    Season Episode    Rating EpisodeNumber
##    <chr>  <chr>       <dbl>         <int>
##  1 1      Episode 1     8.9             1
##  2 1      Episode 2     8.6             2
##  3 1      Episode 3     8.5             3
##  4 1      Episode 4     8.6             4
##  5 1      Episode 5     9               5
##  6 1      Episode 6     9.1             6
##  7 1      Episode 7     9.1             7
##  8 1      Episode 8     8.9             8
##  9 1      Episode 9     9.6             9
## 10 1      Episode 10    9.4            10
## # ℹ 70 more rows

The Season column is converted to a labelled factor, and each season is then assigned a custom colour associated with one of the houses in the show.

# Convert the 'Season' column (stored as text "1" to "8") to a factor with appropriate labels
data1$Season <- factor(data1$Season, 
                       levels = 1:8, 
                       labels = c("Season 1", "Season 2", "Season 3", "Season 4", "Season 5", "Season 6", "Season 7", "Season 8"))

# Assign custom colors to each line based on the season
custom_colors <- c(
  "Season 1" = "#7f7f7f",   # Grey for Season 1, House Stark
  "Season 2" = "#ffc406",   # Yellow for Season 2, House Baratheon
  "Season 3" = "#006400",   # Green for Season 3, House Tyrell
  "Season 4" = "#ED7014",   # Orange for Season 4, House Martell
  "Season 5" = "#B03060",   # Maroon for Season 5, House Lannister
  "Season 6" = "#023E8A",   # Blue for Season 6, House Arryn
  "Season 7" = "#000000",   # Black for Season 7, House Greyjoy
  "Season 8" = "#ff0000"    # Red for Season 8, House Targaryen
)

Season 1: #7f7f7f
Season 2: #ffc406
Season 3: #006400
Season 4: #ED7014
Season 5: #B03060
Season 6: #023E8A
Season 7: #000000
Season 8: #ff0000
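Because `scale_color_manual()` matches the names of the colour vector against the factor levels, a mistyped name would leave a season without its custom colour. A small sanity-check sketch, with the labels and colours copied from the code above and the helper names `season_levels`, `missing_colors` and `valid_hex` introduced here purely for illustration:

```r
# Season labels and colours copied from the customisation code
season_levels <- paste("Season", 1:8)
custom_colors <- c(
  "Season 1" = "#7f7f7f", "Season 2" = "#ffc406",
  "Season 3" = "#006400", "Season 4" = "#ED7014",
  "Season 5" = "#B03060", "Season 6" = "#023E8A",
  "Season 7" = "#000000", "Season 8" = "#ff0000"
)

# Levels that have no matching named colour (expected to be empty)
missing_colors <- setdiff(season_levels, names(custom_colors))

# Every value should be a six-digit hex code
valid_hex <- grepl("^#[0-9A-Fa-f]{6}$", custom_colors)

stopifnot(length(missing_colors) == 0, all(valid_hex))
```

Running a check like this before plotting is optional, but it catches naming mismatches earlier than inspecting the legend by eye.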

5.3 Customised Visualisation

# Create the plot with new customisations
p1 <- ggplot(data1, aes(x = EpisodeNumber, y = Rating, color = factor(Season))) +  
  geom_line() +                        
  geom_point() +                       
  labs(x = "Episode Number",            
       y = "Episode Rating",            
       color = "",                     # Empty string leaves the legend without a title
       caption = "Source: IMDB.com") +  # Add source text at the bottom
  ggtitle("Game of Thrones Episode Ratings Per Season") + # Add a title
  theme_minimal() +  # Clean, minimal theme
  scale_color_manual(values = custom_colors) +  # Apply custom colors for lines
  scale_x_continuous(breaks = seq(1, max(data1$EpisodeNumber), by = 1)) +  # Set x-axis breaks
  scale_y_continuous(
    breaks = seq(4, 10, by = 0.5),  # Set y-axis breaks
    limits = c(4, 10),              # Set y-axis limits
    expand = c(0, 0)                # Remove extra padding
  ) + 
  theme(legend.position = "right") +  # Position the legend on the right
  guides(color = guide_legend(
    keywidth = 2,  # Adjust the size of the legend key (box around the color circle)
    keyheight = 2, # Adjust the size of the legend key (box around the color circle)
    override.aes = list(size = 5)  # Increase the size of the color circles inside the legend
  ))

# Display the plot
print(p1)
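To complement the plot when answering the research question, the ratings can also be summarised per season. The sketch below is an illustration rather than part of the original analysis: it rebuilds the ratings from the table in Section 4 (Episode 11 excluded, `NA` padding omitted) so that it runs on its own, and the names `ratings`, `season_data` and `season_summary` are introduced here for the example.

```r
library(dplyr)
library(tibble)

# Ratings copied from the table in Section 4, Episode 11 excluded and NA padding omitted
ratings <- list(
  "Season 1" = c(8.9, 8.6, 8.5, 8.6, 9.0, 9.1, 9.1, 8.9, 9.6, 9.4),
  "Season 2" = c(8.6, 8.3, 8.7, 8.6, 8.6, 8.9, 8.8, 8.6, 9.7, 9.3),
  "Season 3" = c(8.6, 8.4, 8.7, 9.5, 8.9, 8.7, 8.6, 8.9, 9.9, 9.1),
  "Season 4" = c(9.0, 9.7, 8.7, 8.7, 8.6, 9.7, 9.0, 9.7, 9.6, 9.7),
  "Season 5" = c(8.3, 8.3, 8.3, 8.5, 8.5, 7.9, 8.8, 9.8, 9.4, 9.1),
  "Season 6" = c(8.4, 9.2, 8.6, 9.0, 9.7, 8.3, 8.5, 8.3, 9.9, 9.9),
  "Season 7" = c(8.5, 8.8, 9.1, 9.7, 8.7, 9.0, 9.4),
  "Season 8" = c(7.6, 7.9, 7.5, 5.5, 5.9, 4.0)
)

# Stack the per-season vectors into one long table
season_data <- bind_rows(lapply(names(ratings), function(season) {
  tibble(Season = season, Rating = ratings[[season]])
}))

# One row per season with the episode count and basic rating statistics
season_summary <- season_data %>%
  group_by(Season) %>%
  summarise(
    Episodes = n(),
    MeanRating = mean(Rating),
    MinRating = min(Rating),
    MaxRating = max(Rating)
  )

print(season_summary)
```

A table like this gives a numeric counterpart to the lines in the plot, which can make season-to-season comparisons easier to state precisely.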

6 Animated Visualisations

6.1 Animated Visualisation 1

anim <- p1 + 
  geom_point() +
  transition_manual(EpisodeNumber, cumulative = TRUE) +
  labs(
    subtitle = "Episode: {current_frame}"  # Dynamic subtitle showing the episode of the current frame
  )

print(anim)

6.2 Animated Visualisation 2

anim2 <- p1 + 
  geom_point() +
  transition_reveal(EpisodeNumber) +
  labs(
    subtitle = "Episode: {frame_along}"  # Dynamic subtitle tracking the current position along the x-axis
  )

print(anim2)
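With `transition_reveal()`, `frame_along` moves continuously between episodes, so the subtitle can show fractional values. A hedged sketch of rounding it inside the label and of saving the result to a GIF, assuming a hypothetical toy data frame (`toy`, values copied from seasons 7 and 8) and assuming the gifski package is installed for rendering; `render_gif()` is a helper written for this example, not a gganimate function:

```r
library(ggplot2)
library(gganimate)

# Toy data: the first six episodes of seasons 7 and 8, copied from the ratings table
toy <- data.frame(
  Season = factor(rep(c("Season 7", "Season 8"), each = 6)),
  EpisodeNumber = rep(1:6, times = 2),
  Rating = c(8.5, 8.8, 9.1, 9.7, 8.7, 9.0, 7.6, 7.9, 7.5, 5.5, 5.9, 4.0)
)

# Labels are glue strings, so an R expression such as round() can be used inside the braces
anim_sketch <- ggplot(toy, aes(x = EpisodeNumber, y = Rating, colour = Season, group = Season)) +
  geom_line() +
  geom_point() +
  transition_reveal(EpisodeNumber) +
  labs(subtitle = "Episode: {round(frame_along)}")

# Hypothetical helper: render the animation and write it to a GIF file
# (assumes the gifski package is available for gifski_renderer())
render_gif <- function(animation, path, nframes = 60, fps = 10) {
  rendered <- animate(animation, nframes = nframes, fps = fps, renderer = gifski_renderer())
  anim_save(path, animation = rendered)
  invisible(path)
}
```

A call such as `render_gif(anim_sketch, file.path(tempdir(), "got_ratings.gif"))` would then write the file; the frame count and frame rate are illustrative defaults rather than values taken from the original analysis.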